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We consider a simplified model of protein folding, with binary degrees of freedom, whose equilib- 
rium thermodynamics is exactly solvable. Based on this exact solution, the kinetics is studied in 
the framework of a local equilibrium approach, for which we prove that (i) the free energy decreases 
with time, (ii) the exact equilibrium is recovered in the infinite time limit, and (iii) the folding rate 
is an upper bound of the exact one. The kinetics is compared to the exact one for a small peptide 
and to Monte Carlo simulations for a longer protein, then rates are studied for a real protein and a 
model structure. 
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Many experimental findings on the folding of small pro- 
teins suggest a strong relationship between the structure 
of the native state (the functional state of the protein) 
and the folding kinetics (see e.g. 0,0 and refs. therein), 
and several theoretical models have been proposed which 
aim to elucidate the protein folding kinetics on the basis 
of this relationship. Among these, three studies appeared 
in 1999 in the same PNAS issue 11,110, all of them based 
on models with binary (ordered/disordered) degrees of 
freedom associated to each aminoacid or peptide bond 
(the bond between consecutive aminoacids). The third 
of these works 5] is particularly interesting because it is 
based on a model with remarkable mathematical proper- 
ties which make it possible to obtain exact results. It is 
a one-dimensional model, with long-range, many-body 
interactions, where a binary variable is associated to each 
peptide bond. Two aminoacids can interact only if they 
are in contact in the native state and all the peptide 
bonds between them are in the ordered state. Moreover 
an entropic cost is associated to each ordered bond. 

A homogeneous version of the model was first intro- 
duced in 1978 by Wako and Saito 0, 0, who solved 
exactly the equilibrium thermodynamics. The full het- 
erogeneous case was later considered by Munoz, Eaton 
and coworkers jl, 1 who introduced the single (dou- 
ble, triple) sequence approximations, i.e. they consid- 
ered only configurations with at most one (two, three) 
stretches of consecutive ordered bonds, for both the equi- 
librium and the kinetics. They applied the model to the 
folding of a 16-aminoacid /3-hairpin 0,0, and to a set 
of 22 proteins 0. Th e eq uilibrium problem has been 
subsequently studied in [lOj , with exact solutions for ho- 
mogeneous /3-hairpin and a-helix structures, mean field 
approximation and Monte Carlo simulations. The exact 
solution for the equilibrium in the full heterogeneous case 
was given in Moreover, in it was shown that 

the equilibrium probability has an important factoriza- 
tion property, which implies the exactness of the clus- 
ter variation method (CVM) [H Q llij |. a variational 
method for the study of lattice systems in statistical me- 
chanics. Recently, the model has been used to study the 



kinetics of the photoactive yellow protein [id llTj and 
the free energy profiles and the folding rates of a set of 
25 two-state proteins ^Mi- It is also interestin g th at the 
WSME model, and the technique developed in 11], have 
found an application in a problem of strained epitaxy 

The ultimate purpose of this class of models being the 
study of the kinetics, under the assumption that it is 
mainly determined by the structure of the native state, it 
is worth asking whether the exact solution developed for 
the equilibrium can be extended to the kinetics. Strictly 
speaking, in the general case an exact solution for the 
kinetics can not be achieved. Nevertheless, thanks to 
the factorization property proved in i it is possible to 
devise a local equilibrium approach for the kinetics which 
can be proved to yield the exact equilibrium state in the 
infinite time limit. It is the aim of the present Letter to 
illustrate this approach and its properties, and to discuss 
its accuracy and its possible applications. 

The WSME model describes a protein of iV + 1 
aminoacids as a chain of N peptide bonds (connecting 
consecutive aminoacids) that can live in two states (na- 
tive and unfolded) and can interact only if they are in 
contact in the native structure and if all bonds in the 
chain between them are native. To each bond is associ- 
ated a binary variable rrii, i G {!,..., N}, with values 0, 1 
for unfolded and native state respectively. The effective 
free energy of the model (sometimes improperly referred 
to as Hamiltonian) reads 

N-l N 3 N 

2—1 j—i+l k—i i=l 

(1) 

where R is the gas constant and T the absolute temper- 
ature. The first term assigns an energy eij < to the 
contact (defined as in 0, [3|) between bonds i and j if 
this takes place in the native structure (A^ j- = 1 in this 
case and A^j — otherwise). The second term repre- 
sents the entropic cost > of ordering bond i. 

A crucial s tep in the exact solution of the equilibrium 
problem |lll Il2l | is a mapping onto a two-dimensional 
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model through the introduction of the variables Xij = 
Y[k=i "^fc which satisfy the short-range constraints Xij = 
Xij-iXi+ij ioi 1 < i < j < N. These can be associated 
to the nodes of a triangular shaped portion A of a two- 
dimensional square lattice, defined by A = S : 
1 < i < j < N}. Let Ca be the set of all configurations 
x on A that fulfil previous constraints and rewrite the 
effective free energy (divided by RT and leaving apart 
an additive constant) in the form 

HAix) = ^ hi^jXij. (2) 

The corresponding Boltzmann distribution, which will be 
denoted by p%, has been shown 12] to factor as 

n[p^(^")r°- (3) 

aeA 

Here ^ is a set of local clusters a C A made of all square 
plaquettes (aa = 1), the triangles lying on the diagonal 
boundary = 1) and their intersections, that is inter- 
nal nearest-neighbour pairs (oq = — 1) and single nodes 
(tta = 1). For each cluster a G ^ we denote by Xa (2:A\a) 
the projection of x onto a (A \ a), by Ca the set of all 
configurations on a that are projections of configurations 
on A, and define the cluster probability as the marginal 
distribution 

Po(^«)=EpaW- (4) 

As a consequence of Eq. Q , the equilibrium problem can 
be solved exactly |1^J by means of the CVM. Let V\ be 
the set of all cluster probabilities p relative to A satisfy- 
ing the compatibility conditions Pi3{xi3) = ^^Pa{xa) 
for a, 13 € A and P C a. Since the Boltzmann distri- 
bution minimizes the free energy and factors, restricting 
the variational principle to distributions with the same 
property one finds that p'^ e I?a is the minimum of the 
variational free energy 

Pa[p] = E ^aa[\npa{Xa) + ha{Xa)]Pa{Xa) (5) 

with respect to p G I?a- Here ha are defined by 
ha{xa) = J2{i,j)ea^i,j^i,3 ^^^^ it foUows that HAix) = 
^^g^ aQ,/iQ(xQ). This variational approach is not the 
most efficient way to solve the equilibrium problem, 
which is handled in by the transfer matrix method. 
Nevertheless it is a good starting point for a very accurate 
treatment of the kinetics. 

Our kinetic problem will be formulated in the frame- 
work of a master equation approach. Denoting by 
Wa [x' — > x) the transition probability per unit time from 
the state x' to x ^ x' , we have to solve 

^M^)= E Wa{x' x)pi{x') (6) 



where Wa{x — > x) is such that X^^gCa ^a(^' ^ x) = 0. 
If the principle of detailed balance holds, i.e. Wa{x' 
x)p\{x') = Wa{x — > x')p\{x), and Wa is irreducible, 
then the expected equilibrium is reached. 

The above problem is in general not exactly solvable 
and in order to overcome this difficulty, we shall assume 
local equilibrium ^^sX is we shall assume that, 

provided the initial condition p\ factors according to Eq. 
© as the equilibrium probability, the solution p\ of the 
master equation factors in the same way at any subse- 
quent time. With this simplification the master equation 
yields lUH^, for the cluster probabilities, 

j/aixc) = J2 ^"(^' ^ n (7) 

where 

Wa{x' ^ Xa) ^ J2 ^ 

a:A\c« 

One can show jij |2^ that the above evolution preserves 
the compatibility conditions between the cluster proba- 
bilities and that the free energy is not increasing in time, 

|j^a[p*]<0 (9) 

(equality holding if and only if — p'^). Since p'^ mini- 
mizes the free energy, it follows by Lyapunov's theorem 
that the exact equilibrium probability is recovered in the 
infinite time limit, 

lim p* = p^ (10) 

t — *+oc 

It is important to stress that in previous approximations 
for the kinetics the above condition was not satisfied, 
and the behaviour of the free energy was not discussed. 

Moreover, denoting by —k {k > 0) the largest eigen- 
value of the jacobian matrix of the r.h.s. in j?)) evaluated 
at equilibrium, we have that ^ for all e Va there 
exist functions i?^ defined on Cq , V a G A, having a finite 
limit for t ^ +oo and such that 

pi=p'a + e~'''Rl (11) 

which allows to define the equilibration rate as fc. It can 
also be shown 22] that this approximate equilibration 
rate is an upper bound of the exact one, which can be 
intuitively understood by observing that the local equi- 
librium assumption implies that we are dealing with an 
evolution in a restricted probability space. 

It is also important to observe that, since the cluster 
probabilities can be written |0 as linear functions of the 
expectation values 

e.,. W = {X^M) = E n [P'M]"" (12) 

xeCA aeA 
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FIG. 1: Probability of /3-hairpin bond i being native, versus 
time. Solid lines: our results, dashed line: exact results. 



FIG. 2: Probability of a-helix structures in headpiece sub- 
domain being native, versus time. Dashed lines: our results, 
solid lines: Monte Carlo simulations. 



(the probability of the stretch from i to j being native) 
Eq. 121 can be rewritten as 



dt 



(13) 



with a suitable function /, and the kinetic problem is 
complete with an initial condition ^(0). 

A couple of important remarks are in order here. First 
of all, for the single "bond-flip" kinetics which will be 
used in the following, Eq. H13|l turns out to be of polyno- 
mial complexity and our approach yields a reduction of 
computational complexity which makes the kinetic prob- 
lem tractable. This might not be true for other, more 
complicated, choices of the transition probability, due to 
the summation in Eq. ((SJ). In addition, we observe that 
the whole a ppr oach can be reformulated in a discrete time 
framework |22l l25j . 

In order to assess the accuracy of our approach, we 
have applied it to the kinetics of the 16 residues C- 
terminal /3-hairpin of streptococcal protein G Bl 0, 0, 
irH - For such a simple system it is easy to compute the 
exact time evolution of ^, denoted by which we com- 
pare to our results. Parameters for the effective free en- 
ergy are taken from T is set to 290 K, the initial 
condition is the equilibrium state at infinite temperature 
and the transition probability is specified by 



(14) 



if X and x' differ by exactly one bond, that is by the 
value of a single rm variable. Here r is a microscopic 

time scale , Wh{x x) = '^Ylx'^x'^^i^ ~^ ^'): ^^'^ 
Wa^x — > x') = if x and x' differ by more than one 
bond. 

In Fig. ^ we report our results for the behaviour of 
the probability ^i^t of bond i being native, as a function 
of time. We do not report curves for every value of i, 
since the curves for i = 5 to 12 (corresponding to most of 
the hydrophobic core) are almost indistinguishable on the 



graph scale, and in addition ^15,15 and ^14,14 are almost 
indistinguishable from ^1^1 and ^2.2 respectively (which 
follows from the symmetry of the hairpin). A more quan- 
titative measure of the accuracy of our approximation is 



max max \fi , 

tGR+ (i,i)eA 



it) 



(15) 



which takes the value 1.6 • 10~^, which is attained for 
ihj) = (5,12) and t ~ 260 r. Equilibration rates are 
also easily computed in our approach. In the case of Fig. 
^we obtain k ~ 3.93 • lO^'^ t~^, while the exact value is 
3.72-10-3 r-i. 

In order to test our approach on a longer protein, we 
have considered the headpiece subdomain of the F-actin- 
binding protein villin (pdb code IVII) and compared our 
solution with Monte Carlo simulations (in this case the 
exact solution is not feasible). The headpiece subdomain 
contains three a-helices going from bond 3 to 7, form 
bond 14 to 17 and from bond 22 to 31. Fig. El shows 
the probabilities ^3^7, ^14.17 and ^22,31 versus time at the 
temperature of 300 K. Parameters for the effective free 
energy are taken as in 0, |^, the energy scale is cho- 
sen in order to have X^i^i = 1/2 at equilibrium at 
the experimental transition temperature T = 343 K (28j . 
The agreement is still remarkably good and similar re- 
sults are obtained for proteins BBL and CI2 (pdb codes 
IBBL and ICOA respectively). 

Fig. 13 reports the equilibration rates as a function of 
temperature for the WW domain of protein PINl (pdb 
code 1I6C) computed using two different choices for the 
transition probability: Glauber kinetics, corresponding 
to Eq. (|14|l and Metropolis kinetics, where Eq. H14|l is 
replaced by 



Wa{x ^ x') ^ T-'^ exp[HA{x) 



Ha{x)] (16) 

if Ha{x') > Ha{x) and W\( x x') = t^^ otherwise. 
Here qi are chosen as in jU, Cij as in 0, 0| and the 
energy scale in order to have the transition at the exper- 
imental temperature of 332 K jS^. It can be seen that 
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FIG. 3: PINl equilibration rate versus inverse temperature. 
Solid lines; Glauber kinetics, dashed line: Metropolis kinetics. 
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FIG. 4: Equilibration rate for a /3-slieet of four strands versus 
absolute contact order. Dots: our results, line: exponential 
fit. 

the detailed choice of the kinetics affects only marginally 
the behaviour of the equilibration rates, which is in very 
goo d qualitative agreement with the experimental results 
[30|. The same behaviour is obtained for protein CI2. 

Finally, given the observed correlation between exper- 
imental equilibration rates and structural characteristics 
like the absolute contact order AGO = ^i<j J ~ 
i + 1) 31] {Nc is the total number of contacts, j — i + 1 
the distance in sequence between aminoacids involved in 
contact we have computed the rates predicted by 

our approach for a simple model structure, an antiparal- 
lel /3-sheet made of 4 strands, varying the length r of the 
strands. In this case the AGO is equal to r, while the rel- 
ative contact order is a constant. We have used Glauber 
kinetics, qi = 2 and eij/{RT) independent of and 
such that jj'^iLi(,i,i — 1/2 at equilibrium. Results are 
reported in Fig. 01 which shows an almost perfect linear 
correlation between logfc and the AGO. 

In conclusion, we can say that the local equilibrium 
approach for the kinetics of the WSME model gives very 
accurate results with respect to the exact ones. It allows 
to compute equilibration rates, and hence to explore the 
relationship between kinetics and structure of the native 



state. The detailed evolution is also available, which can 
be very useful to study folding pathways. Work is in 
progress on several two-state folders and on the effect of 
mutations. Details of a few mathematical proofs will be 
given in |2^. 

We are grateful for many fruitful discussions with Picr- 
paolo Bruscolini. 
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